Learning from Noisy Data using Hyperplane Sampling and Sample Averages

Authors

  • Guillaume Stempfel
  • Liva Ralaivola
  • François Denis
Abstract

We present a new classification algorithm capable of learning from data corrupted by class-dependent uniform classification noise. The algorithm produces a linear classifier and works seamlessly with kernels. It relies on sampling random hyperplanes, which help build new training examples whose correct classes are known; a linear classifier (e.g., an SVM) is learned from these examples and output by the algorithm. The new examples are sample averages computed from the available data over regions of the space defined by the random hyperplanes and the target hyperplane. We provide a statistical analysis of the properties of these sample averages, together with results from numerical simulations on synthetic datasets. These simulations show that the linear and kernelized versions of our algorithm are effective for learning from both noise-free and noisy data.
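To make the setting concrete, the following is a minimal sketch of the noise model named in the abstract, assuming labels in {-1, +1} and flip rates `eta_pos` and `eta_neg` (names chosen here for illustration). The helper `region_averages` is hypothetical: it only illustrates the general idea of averaging examples over regions cut out by a random hyperplane, grouped by observed label, and is not the construction used in the paper, whose details the abstract does not give.

```python
import numpy as np

def flip_labels(y, eta_pos, eta_neg, rng):
    """Class-dependent uniform noise: flip +1 labels with probability
    eta_pos and -1 labels with probability eta_neg, independently."""
    y = np.asarray(y)
    flip_prob = np.where(y == 1, eta_pos, eta_neg)
    flips = rng.random(y.shape) < flip_prob
    return np.where(flips, -y, y)

def region_averages(X, y_noisy, w):
    """Hypothetical helper (not from the paper): mean of the examples in
    each region given by the side of hyperplane w and the observed label."""
    side = np.where(X @ w >= 0, 1, -1)
    averages = {}
    for s in (1, -1):
        for label in (1, -1):
            mask = (side == s) & (y_noisy == label)
            if mask.any():  # skip empty regions
                averages[(s, label)] = X[mask].mean(axis=0)
    return averages

# Synthetic data labeled by an assumed target hyperplane through the origin.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
w_target = np.array([1.0, -1.0])
y_clean = np.where(X @ w_target >= 0, 1, -1)

# Corrupt the labels, then average over regions of one random hyperplane.
y_noisy = flip_labels(y_clean, eta_pos=0.3, eta_neg=0.1, rng=rng)
w_random = rng.normal(size=2)
averages = region_averages(X, y_noisy, w_random)
```

In this sketch the empirical flip rate within each class should be close to the chosen `eta_pos` and `eta_neg`, which is what distinguishes class-dependent noise from a single uniform flip rate.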


Similar resources

Misclassification and Margin based SVM Active Learning Algorithm for Audio Event Detection

Audio event detection has become a hot research topic due to its wide applications in many fields, such as multimedia retrieval. Detection needs large amounts of labeled samples to train the audio event models, but in practice labeled samples are expensive to obtain, and their shortage is a big obstacle. Active learning is an efficient way to deal with the problem of ...


Noise-enhanced convolutional neural networks

Injecting carefully chosen noise can speed convergence in the backpropagation training of a convolutional neural network (CNN). The Noisy CNN algorithm speeds training on average because the backpropagation algorithm is a special case of the generalized expectation-maximization (EM) algorithm and because such carefully chosen noise always speeds up the EM algorithm on average. The CNN framework...


Data Mining from Noisy Learners

In this paper we discuss issues related to data mining from a noisy database such as what might be generated by a machine learning system. We describe an approach for estimating joint probability distributions of the noise-free case in terms of noisy observables and conditional probabilities which can be estimated using statistical sampling and error analysis. Several experiments are presented ...


Sweep-Hyperplane Clustering Algorithm Using Dynamic Model

Clustering is one of the better known unsupervised learning methods with the aim of discovering structures in the data. This paper presents a distance-based Sweep-Hyperplane Clustering Algorithm (SHCA), which uses sweep-hyperplanes to quickly locate each point’s approximate nearest neighbourhood. Furthermore, a new distance-based dynamic model that is based on 2N -tree hierarchical space partit...


An Approximate Analytical Approach to Resampling Averages

Using a novel reformulation, we develop a framework to compute approximate resampling data averages analytically. The method avoids multiple retraining of statistical models on the samples. Our approach uses a combination of the replica “trick” of statistical physics and the TAP approach for approximate Bayesian inference. We demonstrate our approach on regression with Gaussian processes. A com...




Publication date: 2008